18 research outputs found

    Moments of Genome Evolution by Double Cut-and-Join

    We study statistical estimators of the number of genomic events separating two genomes under a Double Cut-and-Join (DCJ) rearrangement model, using the method of moments. We first propose an exact, closed-form, analytically invertible formula for the expected number of breakpoints after a given number of DCJs. It improves on the previously proposed formula, which was heuristic, recursive, and computationally slower. We then explore the analogies between genome evolution by DCJ and the evolution of binary sequences under substitutions, of permutations under transpositions, and of random graphs. In the literature, each of these analogies is presented with intuitive justifications and used to import results from better-known fields. We formalize these relations by proving a correspondence between moments in sequence evolution and in genome evolution, provided that substitutions occur four at a time in the corresponding sequence model. Finally, we prove bounded errors for two estimators of the number of cycles in the breakpoint graph after a given number of rearrangements, by analogy with cycles in permutations and components in random graphs.

    Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) [2013/25084-2]; Agence Nationale pour la Recherche, Ancestrome project [ANR-10-BINF-01-01].
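    The DCJ breakpoint formula itself is not given in this abstract. As a hedged illustration of the method-of-moments idea on the simplest analogy it mentions (binary sequences under substitutions), the sketch below uses a classical result: after k substitutions at uniformly chosen positions among n binary sites, a site differs from its initial state exactly when it was hit an odd number of times, which happens with probability (1 - (1 - 2/n)^k)/2. That expectation is closed and analytically invertible, the same property the abstract claims for its DCJ formula. This is not the authors' estimator, and the function names are hypothetical.

```python
import math
import random


def expected_differences(n: int, k: int) -> float:
    """Expected number of sites differing from the ancestor after k substitutions.

    Each substitution flips one of n binary sites chosen uniformly at random
    (with replacement), so a site differs iff it was hit an odd number of times:
    P(odd) = (1 - (1 - 2/n)**k) / 2.
    """
    return n * (1.0 - (1.0 - 2.0 / n) ** k) / 2.0


def moment_estimate(d: float, n: int) -> float:
    """Method-of-moments estimate of k: solve expected_differences(n, k) == d."""
    if n <= 2:
        raise ValueError("need more than two sites")
    if not 0 <= d < n / 2:
        raise ValueError("observed differences must satisfy 0 <= d < n/2 (saturation)")
    return math.log(1.0 - 2.0 * d / n) / math.log(1.0 - 2.0 / n)


def simulate_differences(n: int, k: int, rng: random.Random) -> int:
    """Apply k uniform substitutions to an all-zero sequence and count changed sites."""
    seq = [0] * n
    for _ in range(k):
        seq[rng.randrange(n)] ^= 1  # flip the chosen site
    return sum(seq)


if __name__ == "__main__":
    rng = random.Random(42)
    n, k = 1000, 300
    observed = simulate_differences(n, k, rng)
    print(observed, expected_differences(n, k), moment_estimate(observed, n))
```

    Near saturation (d close to n/2) the argument of the logarithm approaches zero and the estimate becomes unstable.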

    Breaking Good: Accounting for Fragility of Genomic Regions in Rearrangement Distance Estimation

    Models of evolution by genome rearrangements are prone to two types of flaws: one is to ignore the diversity of susceptibility to breakage across genomic regions, and the other is to suppose that susceptibility values are given. Without necessarily assuming their precise localization, we call "solid" the regions that are unlikely to be broken by rearrangements and "fragile" the regions outside solid ones. We propose a model of evolution by inversions in which breakage probabilities vary across fragile regions and over time. It contains as a particular case the uniform breakage model on the nucleotide sequence, where breakage probabilities are proportional to fragile region lengths. This is very different from the frequently used pseudouniform model, in which all fragile regions have the same probability of breaking. Estimates of rearrangement distances based on the pseudouniform model fail completely on simulations with the truly uniform model. On pairs of amniote genomes, we show that identifying coding genes with solid regions yields incoherent distance estimates, especially with the pseudouniform model and, to a lesser extent, with the truly uniform model. This incoherence is resolved when we coestimate the number of fragile regions together with the rearrangement distance. The estimated number of fragile regions is surprisingly small, suggesting that a minority of regions are recurrently used by rearrangements. Estimates for several pairs of genomes at different divergence times agree with a slowly evolving colocalization of active genomic regions in the cell.

    Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) [2013/25084-2]; French Agence Nationale de la Recherche (ANR) [ANR-10-BINF-01-01]; ICT FP7 European programme EVOEVO.
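    A toy calculation can make the pseudouniform-versus-uniform contrast concrete. Assume (a simplification, not the paper's model) that each of k cuts falls in fragile region i with probability w_i, and that a region counts as a breakpoint once it has been cut at least once, ignoring adjacencies restored by later cuts. The expected number of broken regions is then the sum over i of 1 - (1 - w_i)^k. The pseudouniform model sets every w_i = 1/m; the uniform model on the nucleotide sequence sets w_i proportional to region length. Because 1 - (1 - w)^k is concave in w, unequal weights yield fewer distinct broken regions for the same k, so inverting the pseudouniform formula underestimates k. The function names and the exponential length distribution are hypothetical.

```python
import math
import random


def expected_broken(weights, k):
    """Expected number of fragile regions hit at least once by k independent cuts."""
    return sum(1.0 - (1.0 - w) ** k for w in weights)


def length_weights(lengths):
    """Uniform breakage on the nucleotide sequence: probability proportional to length."""
    total = sum(lengths)
    return [length / total for length in lengths]


def pseudouniform_estimate(broken, m):
    """Invert m * (1 - (1 - 1/m)**k) == broken for k (needs m > 1, 0 <= broken < m)."""
    if m <= 1 or not 0 <= broken < m:
        raise ValueError("need m > 1 and 0 <= broken < m")
    return math.log(1.0 - broken / m) / math.log(1.0 - 1.0 / m)


def simulate_broken(weights, k, rng):
    """Draw k cuts with the given region probabilities; count distinct regions hit."""
    return len(set(rng.choices(range(len(weights)), weights=weights, k=k)))


if __name__ == "__main__":
    rng = random.Random(1)
    m, k = 500, 400
    # Hypothetical fragile-region lengths with a long right tail.
    lengths = [rng.expovariate(1.0) for _ in range(m)]
    pseudo_w = [1.0 / m] * m
    uniform_w = length_weights(lengths)
    for name, w in (("pseudouniform", pseudo_w), ("uniform", uniform_w)):
        b = expected_broken(w, k)
        print(name, round(b, 1), round(pseudouniform_estimate(b, m), 1))
```

    Under pseudouniform weights the estimate recovers k exactly; under length-proportional weights it falls below k, in the direction of the failure the abstract reports.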

    Comparative genomics on artificial life

    Molecular evolutionary methods and tools are difficult to validate, as we have almost no direct access to ancient molecules. Inference methods may be tested on simulated data, which provide full scenarios against which their results can be compared. But simulations are often designed alongside a particular method, by the same team and under the same assumptions, when the two should be blind to each other. In silico experimental evolution consists of evolving digital organisms in order to test or discover complex evolutionary processes. Its models were not designed with a particular inference method in mind, only with basic biological principles. As such, they provide a unique opportunity to blind-test the behavior of inference methods. We give a proof of concept on a comparative genomics problem: inferring the number of inversions separating two genomes. We use Aevol, an in silico experimental evolution platform, to produce benchmarks, and show that most combinatorial or statistical estimators of the number of inversions fail on this dataset, whereas they behave perfectly on ad hoc simulations. We argue that biological data are probably closer to this difficult situation.

    12th Conference on Computability in Europe (CiE), Paris, France, June 27, 2016.
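    To make the benchmark idea concrete, the sketch below pairs an ad hoc simulator (inversions with uniformly drawn endpoints on a signed permutation) with the classical breakpoint lower bound. An inversion reverses a segment and flips its signs, which preserves every adjacency inside the segment and can change only the two at its ends, so at least ceil(b/2) inversions are needed to create b breakpoints. This is a generic combinatorial estimator, not Aevol and not necessarily one of the estimators tested in the paper; the function names are hypothetical.

```python
import math
import random


def apply_inversion(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the orientation of each gene."""
    perm[i:j + 1] = [-g for g in reversed(perm[i:j + 1])]


def count_breakpoints(perm):
    """Count non-adjacencies in the signed permutation framed by 0 and n + 1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if b - a != 1)


def breakpoint_lower_bound(perm):
    """Each inversion changes at most two adjacencies, so distance >= ceil(b / 2)."""
    return math.ceil(count_breakpoints(perm) / 2)


def random_scenario(n, k, rng):
    """Apply k inversions with uniformly drawn endpoints to the identity on n genes."""
    perm = list(range(1, n + 1))
    for _ in range(k):
        i, j = sorted((rng.randrange(n), rng.randrange(n)))
        apply_inversion(perm, i, j)
    return perm


if __name__ == "__main__":
    rng = random.Random(7)
    for k in (10, 50, 200, 1000):
        perm = random_scenario(100, k, rng)
        print(k, breakpoint_lower_bound(perm))
```

    With 100 genes there are at most 101 breakpoints, so the bound can never exceed 51 and necessarily underestimates long scenarios.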

    Median approximations for genomes modeled as matrices

    The genome median problem is an important problem in phylogenetic reconstruction under rearrangement models. It can be stated as follows: given three genomes, find a fourth that minimizes the sum of the pairwise rearrangement distances between it and the three input genomes. In this paper, we model genomes as matrices and study the matrix median problem using the rank distance. It is known that, for any metric distance, at least one of the corners (the three input genomes) is a 4/3-approximation of the median. Our results allow us to compute up to three additional matrix median candidates, all with approximation ratios at least as good as that of the best corner, when the input matrices come from genomes. We also show a class of instances where our candidates are optimal. From the application point of view, it is usually more interesting to locate medians farther from the corners, so these new candidates are potentially more useful. In addition to the approximation algorithm, we suggest a heuristic to obtain a genome from an arbitrary square matrix. This is useful for translating the results of our median approximation algorithm back to genomes, and it performs well in our tests. To assess the relevance of our approach in the biological context, we ran simulated evolution tests and compared our solutions to those of an exact DCJ median solver. The results show that our method is capable of producing very good candidates.

    Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) [2012/13865-7; 2012/14104-].
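    The rank distance and the corner baseline can be sketched directly. The sketch assumes a matrix encoding of genomes that the abstract does not spell out: gene extremities are indexed 0..N-1, an adjacency {a, b} sets M[a][b] = M[b][a] = 1, and a telomere t sets M[t][t] = 1, so each genome becomes a symmetric permutation matrix. The rank distance is rank(A - B), computed here by exact rational Gaussian elimination, and the best corner is the input with the smallest total distance to the three inputs; by the result quoted above, it is a 4/3-approximation of the median. The paper's additional candidates and its matrix-to-genome heuristic are not implemented, and all names are hypothetical.

```python
from fractions import Fraction


def genome_matrix(n_extremities, adjacencies):
    """Symmetric 0/1 matrix: swap adjacent extremities, fix telomeres on the diagonal."""
    m = [[0] * n_extremities for _ in range(n_extremities)]
    paired = set()
    for a, b in adjacencies:
        m[a][b] = m[b][a] = 1
        paired.update((a, b))
    for t in range(n_extremities):
        if t not in paired:
            m[t][t] = 1  # telomere
    return m


def rank(matrix):
    """Rank by Gaussian elimination over the rationals (exact, no rounding)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            if factor:
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def rank_distance(a, b):
    """Rank distance between two equally sized matrices: rank(A - B)."""
    return rank([[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)])


def best_corner(genomes):
    """Return (index, score) of the input minimizing the sum of distances to all inputs."""
    scores = [sum(rank_distance(g, h) for h in genomes) for g in genomes]
    best = min(range(len(genomes)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    a = genome_matrix(4, [])        # two genes, all four extremities are telomeres
    b = genome_matrix(4, [(1, 2)])  # extremities 1 and 2 joined
    c = genome_matrix(4, [(0, 3)])  # extremities 0 and 3 joined
    print(best_corner([a, b, c]))   # (0, 2)
```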